Non-linear Least Squares

ESM 244 2024

Nathaniel Grimes

Bren School of Environmental Science

2024-02-08

Quick Aside

I made this presentation in Quarto using RevealJS. The code is up on Canvas if you want to see more ways Quarto can be used beyond just homework assignments

Pros

  • Easily integrate R code

  • Update figures automatically

  • Living presentation with html features

  • Easier to write math with LaTeX and MathJax

Cons

  • Everything is defined in code; no easy PowerPoint tools

  • Maybe not as “sexy” as other presentations

Packages to follow along with

library(tidyverse)
#remotes::install_github("lter/lterdatasampler")
library(lterdatasampler)
library(knitr)
library(broom)
library(investr)
library(kableExtra)

What is Non-linear Least Squares?

Remember what Ordinary Least Squares does first

The Fundamental Objective of OLS

  • Best Fit a Line to Data
  • How Does OLS Fit a Line to Data?

Hint:

\(\hat{\beta}=\frac{\sum^n_{i=1}(x_i-\bar{x})(y_i-\bar{y})}{\sum^n_{i=1}(x_i-\bar{x})^2}\)

Remember what Ordinary Least Squares does first


How does OLS fit the line?

  • Minimize squared error (aka residuals)

\(\hat\beta=\arg\min_\beta \sum^n_{i=1}\hat{\epsilon}_i^2=\arg\min_\beta\sum^n_{i=1}(y_i-\beta x_i)^2\)
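The closed-form slope from the hint can be checked directly in R: compute \(\hat\beta\) by hand and compare it to lm(). The data below are simulated purely for illustration.

```r
# Check the closed-form OLS slope against lm() on simulated data
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)

# Slope formula from the previous slide
beta_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
ols_fit  <- lm(y ~ x)

beta_hat                 # hand-computed slope
coef(ols_fit)[["x"]]     # identical (up to floating point)
```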

OLS is simple and powerful

What are some limitations of OLS?

  • Linear relationship between the response (y) and the predictors (x)

  • Multicollinearity

  • Errors normally distributed

  • Homoscedasticity

  • Sensitive to outliers

What if we get something like this?

Or this?

In specific applications, accuracy greatly matters!

Nonlinear Least Squares

Apply the same idea of least squares error minimization, but with any function

\[ \begin{aligned} y_i&=f(x_i,\boldsymbol\beta)+\epsilon_i &\text{(1)}\\ \min_{\boldsymbol\beta}&\sum^n_{i=1}\epsilon_i^2=\sum^n_{i=1}(y_i-f(x_i,\boldsymbol\beta))^2 &\text{(2)} \end{aligned} \]

General idea is very similar, but implementation and use is quite different

How NLS works

No simple analytical solution like in OLS (Solve for \(\hat\beta\))

Iteratively approximate the solution through algorithms

  • Gauss-Newton (Most Common)

  • Levenberg-Marquardt (More flexible)

Approximate the function’s gradient (think derivative), then keep stepping until a convergence criterion is met

\[ \left|\frac{\overbrace{S^k}^{\text{previous squared error}}-\overbrace{S^{k+1}}^{\text{updated squared error}}}{S^k}\right|<0.0001 \]
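That stopping rule can be sketched in a couple lines of R. The tolerance mirrors the slide; sse_old and sse_new stand in for the squared errors of two successive iterations.

```r
# Relative-change stopping rule used by nls-style solvers (sketch)
converged <- function(sse_old, sse_new, tol = 1e-4) {
  abs((sse_old - sse_new) / sse_old) < tol
}

converged(100, 50)       # big improvement -> keep iterating (FALSE)
converged(100, 99.999)   # tiny improvement -> stop (TRUE)
```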

You have already used these algorithms before

fit.variogram uses the Levenberg-Marquardt algorithm to find the nugget, sill, and range

Demonstration of Gauss-Newton Algorithm

  • Global minimum at a=2.25

  • Begins at initial guess of a=3.5

  • Step size depends on the 2nd order approximation

  • Keeps going until the green line (first derivative) reaches close to zero
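The animation isn’t reproduced here, but the update rule can be sketched in a few lines of R. The quartic f below is a stand-in objective I chose because its minimum sits at a = 2.25, matching the slide; derivatives are approximated numerically.

```r
# Stand-in objective with its global minimum at a = 2.25 (assumed for illustration)
f  <- function(a) (a - 2.25)^4 + (a - 2.25)^2
d1 <- function(a, h = 1e-5) (f(a + h) - f(a - h)) / (2 * h)        # ~first derivative
d2 <- function(a, h = 1e-4) (f(a + h) - 2 * f(a) + f(a - h)) / h^2 # ~second derivative

a <- 3.5                     # initial guess, as on the slide
while (abs(d1(a)) > 1e-6) {  # stop once the first derivative is near zero
  a <- a - d1(a) / d2(a)     # step size set by the 2nd-order approximation
}
round(a, 4)                  # converges to ~2.25
```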

Why should we use NLS?

  1. Less Restrictive

  2. Interpretable

    • Estimated parameters connect to meaningful processes
  3. Parsimonious

    • Compared to polynomial models, GAMs, and other GLMs, NLS finds a solution with less data and fewer parameters
  4. Predictive

    • Often more accurate and faster

When to use NLS

Best suited for specific model parameterization given a collection of data

Have a known equation and want to fit parameters

There is no \(R^2\) value to compare across model specifications, but we can still test model performance using AIC or k-fold cross-validation with RMSE (in lab this week!)
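As a sketch of that comparison (on simulated Gompertz-shaped data, not the bison data), AIC and RMSE can rank an NLS fit against a plain linear fit:

```r
# Simulated data from an assumed Gompertz curve, for illustration only
set.seed(1)
sim_age    <- runif(200, 0, 15)
sim_weight <- 1000 * exp(-exp(-0.7 * (sim_age - 0.3))) + rnorm(200, sd = 50)

fit_nls <- nls(sim_weight ~ b1 * exp(-exp(-b2 * (sim_age - b3))),
               start = list(b1 = 900, b2 = 1, b3 = 0))
fit_lm  <- lm(sim_weight ~ sim_age)

rmse <- function(m) sqrt(mean(resid(m)^2))
AIC(fit_nls, fit_lm)                        # lower AIC wins
c(nls = rmse(fit_nls), lm = rmse(fit_lm))   # lower RMSE wins
```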

Pitfalls (literally) and warnings

NLS is only as good as the underlying model. Bring your brain to the party and make sure the model you’re fitting is appropriate

Follows gradient of steepest descent \(\rightarrow\) local min/max valleys

  • With n parameters, the chance of landing in a local min/max rises

Requires good initial guesses

Know when to use NLS or another option

  • In this research we ended up using Genetic Algorithms instead as they provide global solutions without having to guess

  • Operationally, the Navy doesn’t have time to guess

Using NLS in R

Implementation Framework

  • Step 1: Identify \(f(x,\beta)\)

    • Use Literature or build your own from theoretical justification
  • Step 2: Build \(f(x,\beta)\) as an R function

  • Step 3: Find initial guesses

  • Step 4: Combine everything and run with nls()

  • Step 5: Evaluate results

Let’s apply NLS to our Female Bisons

knz_bison_age <- knz_bison %>% 
  mutate(animal_age = rec_year - animal_yob) %>% 
  filter(animal_sex=="F")

Bison Restoration needs accurate ecosystem and population models


Fit data from Konza Prairie LTER to inform Sun Prairie Restoration Efforts

Use R Built in functions

df_nls <- nls(formula,  # Model we want to estimate
              data,     # Data we are evaluating
              start,    # Our initial guesses
              control,  # List of tolerance values, etc.
              trace,    # Do we want to see convergence?
              upper,    # Bounds on input parameters
              ...)      # Some other useful arguments

Step 1: What Model to use?

Martin and Barboza (2020) used a Gompertz model to predict bison mass

\[BM=b_1\exp(-\exp(-b_2(age-b_3)))\]

\(b_1\) = asymptotic body mass (pounds)

\(b_2\) = instantaneous growth rate

\(b_3\) = age at inflection point (years)

\(age\) = age (years), the independent variable

\(BM\) = body mass (pounds), the dependent variable

Step 2: Build R function

gompertz <- function(b1, b2, b3, age){
  BM <- b1 * exp(-exp(-b2 * (age - b3)))
  return(BM)
}



Note: For the nls function it’s okay to name every parameter separately like we did. In other optimization tools (e.g. optim) you would instead keep the decision variables together as a single vector in the first argument

Step 3: Provide Guesses

df_nls <- nls(animal_weight ~ gompertz(b1, b2, b3, animal_age),
              data  = knz_bison_age,
              start = list(b1 = ?, b2 = ?, b3 = ?),
              trace = TRUE)


The start list and the data argument also tell nls which names are parameters to estimate and which are variables in our data

Step 3: Provide Guesses

4 methods for providing guesses

  1. Use past parameters from similar studies

  2. Use data to internally define guesses (min, mean, max, etc.)

  3. In 2-D, look at the graphs and estimate

  4. In N-D, combine steps 1-2 then create a start grid to search over
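Method 2 can be sketched directly in code. The data frame below is simulated (the real analysis would use knz_bison_age, and the "true" curve here is an assumption), but the guess-building logic carries over.

```r
# Simulated stand-in for the bison data (assumed truth: b1=1000, b2=0.7, b3=0.3)
set.seed(7)
df <- data.frame(animal_age = runif(300, 0, 15))
df$animal_weight <- 1000 * exp(-exp(-0.7 * (df$animal_age - 0.3))) +
  rnorm(300, sd = 40)

start_guess <- list(
  b1 = max(df$animal_weight),                 # asymptote: heaviest animal observed
  b2 = 1,                                     # growth rate: order-of-magnitude guess
  b3 = unname(quantile(df$animal_age, 0.05))  # inflection: early in life
)

fit <- nls(animal_weight ~ b1 * exp(-exp(-b2 * (animal_age - b3))),
           data = df, start = start_guess)
round(coef(fit), 2)   # recovers roughly (1000, 0.7, 0.3)
```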

Interactive applied guesses

Copy and paste this code in an R script to run a shiny app.

You might have to install lterdatasampler before running

Link to a static slide in case the app doesn’t work

Code
#remotes::install_github("lter/lterdatasampler") in case you need to install

library(shiny)
library(tidyverse)
library(lterdatasampler)
library(shinyWidgets)

# Make function outside for easier reading
gompertz<-function(b1,b2,b3,age){
  BM= b1*exp(-exp(-b2*(age-b3)))
  return(BM)
}

x=seq(from=0, to=20,length.out=100)

#start UI
ui<-fluidPage(
  setSliderColor(rep("#003660",times=3),seq(1,3)),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bm",
                label="Asymptotic Body Mass (b1)",
                min=0,max=2000,
                value=500),
      sliderInput("ig",
                label="Instantaneous growth rate (b2)",
                min=0,max=2,
                value=1,
                step=.2),
      sliderInput("age_in",
                label="Age inflection (b3)",
                min=0,max=10,
                value=5),
    width=4),
    mainPanel(
      plotOutput("gompertz")
    )
  )
)

server<-function(input,output){
  dft<-reactive({
    #make a dataframe that changes
    data.frame(age=x,weight=gompertz(input$bm,input$ig,input$age_in,x))
  })
  #Graph said dataframe using bison data on top of our model dataframe that changes with inputs
  output$gompertz=renderPlot({knz_bison %>% 
    mutate(animal_age = rec_year - animal_yob) %>% 
    filter(animal_sex=="F") %>% 
    ggplot()+
      geom_point(aes(x=animal_age,y=animal_weight),size=2.5,alpha=0.2,color='purple')+
      theme_minimal()+
      xlab('Age')+
      ylab('Weight')+
      ggtitle("Female Bison from Konza Prairie")+
      theme(axis.title = element_text(size=26),axis.text = element_text(size=20))+
      theme(plot.title = element_text(size=28,hjust=0.5))+
    geom_line(data=dft(),aes(x=age,y=weight),color="#79A540",size=2)
  })
  
}
  
shinyApp(ui=ui,server=server)  

Step 4: Run Model

b_gompertz<-nls(animal_weight~gompertz(b1,b2,b3,animal_age),
                      data = knz_bison_age,
                      start = list(b1=1000,b2=1,b3=0.6),
                      trace = TRUE)
5.291715e+07 (7.61e-01): par = (1000 1 0.6)
3.442663e+07 (1.91e-01): par = (1007.201 0.6930941 0.3444975)
3.320797e+07 (7.02e-03): par = (1009.67 0.7250854 0.2798164)
3.320631e+07 (1.28e-04): par = (1009.764 0.727529 0.2832794)
3.320631e+07 (3.10e-06): par = (1009.756 0.7275973 0.2832921)

Step 5: What did the model find?

tidy(b_gompertz) %>% 
  kable() %>% 
  kable_classic()
term  estimate      std.error  statistic  p.value
b1    1009.7560205  1.9031134  530.58110  0
b2    0.7275973     0.0075495  96.37640   0
b3    0.2832921     0.0080643  35.12911   0
glance(b_gompertz) %>% 
  kable() %>% 
  kable_classic()
sigma     isConv  finTol   logLik     AIC       BIC       deviance  df.residual  nobs
81.12163  TRUE    3.1e-06  -29357.87  58723.74  58749.85  33206307  5046         5049

How well does the model predict?

Model results indicate the lowest possible sum of squared error

model_aug<-broom::augment(b_gompertz)

sum((model_aug$.resid)^2) # Sum of the squared error
[1] 33206307


If we compare the different model runs in the trace output, we can see this is the smallest sum of squared errors. No other parameter combination will produce a lower number.
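Equivalently (shown here on simulated data rather than the bison fit), deviance(), resid(), and a hand-computed SSE all report the same quantity for an nls object:

```r
# Simulated Gompertz-shaped data, assumed purely for illustration
set.seed(2)
sim_age    <- runif(300, 0, 15)
sim_weight <- 1000 * exp(-exp(-0.7 * (sim_age - 0.3))) + rnorm(300, sd = 50)
fit <- nls(sim_weight ~ b1 * exp(-exp(-b2 * (sim_age - b3))),
           start = list(b1 = 900, b2 = 1, b3 = 0))

sse <- sum(resid(fit)^2)        # hand-computed sum of squared errors
all.equal(sse, deviance(fit))   # TRUE: deviance() is the same number
sqrt(sse / df.residual(fit))    # residual sigma, as reported by glance()
```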

Adding confidence intervals is easy

conf <- as_tibble(predFit(b_gompertz,
                          newdata = list(animal_age = age_series),
                          interval = "confidence",
                          level = 0.95))
conf$age <- age_series
head(conf,n=4) %>% 
  kable() %>% 
  kable_classic()
fit       lwr       upr       age
295.4679  290.6938  300.2419  0.0
322.0798  317.5501  326.6096  0.1
348.9703  344.6794  353.2612  0.2
375.9842  371.9130  380.0553  0.3
#plot+geom_ribbon(data=conf...)

Model fits so well, the confidence intervals don’t even show on plot.

Summary

  • Break linear assumption

  • Best used with an underlying model

  • Feed initial guess (How?)

  1. Literature/Theory
  2. From data
  3. Graph
  4. Create starting grids

NLS place in the world

[Diagram: Machine Learning, Non-linear Least Squares, Optimization]

NLS falls under the optimization umbrella

Many flavors and varieties of optimization

As general as possible

\[ \begin{aligned} V(x,c)&=\max_c f(x,c) &\text{Subject to} \\ x&\ge0\\ c&\ge0 \\ x&=g(x,c) \end{aligned} \]

Used extensively in economic research, numerical modelling, engineering, and geophysics

Which method to use?

Depends on the question being asked

  • What mathematical form is the optimization equation in?

    • Maximum likelihood estimation is different than quadratic programming
  • Do I need it to be fast or accurate?

How many parameters, and in what form?

It’s better to use methods you understand than ones you don’t

Optimization toolkit highlights

optim/optimx: Workhorse functions in R

quadprog: Used by a GP I advised in 2021 and by the Salmon Stocks GP now

GA: Genetic algorithms are the best global tool that I know of

NLoptr: New project trying to keep algorithm syntax the same across computer languages

List of every optimization function with short descriptions
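As a taste of optim (the simulated data and starting values below are assumptions, not the bison analysis): it can minimize the same Gompertz SSE directly, with all decision variables packed into a single par vector, as flagged back in the nls section.

```r
# Simulated data from an assumed Gompertz curve
set.seed(3)
sim_age    <- runif(200, 0, 15)
sim_weight <- 1000 * exp(-exp(-0.7 * (sim_age - 0.3))) + rnorm(200, sd = 50)

# Objective: sum of squared errors, with all parameters in one vector
sse <- function(par) {
  pred <- par[1] * exp(-exp(-par[2] * (sim_age - par[3])))
  sum((sim_weight - pred)^2)
}

fit <- optim(par = c(b1 = 900, b2 = 1, b3 = 0), fn = sse)
round(fit$par, 2)   # roughly recovers (1000, 0.7, 0.3)
```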

Static Slides in case shiny app doesn’t work

How we could use step 2 and 3 in this case?

Asymptotic body mass \(b_1\) implies a maximum body mass: we could take the biggest observed female, or eyeball it from the graph

Age at the inflection point \(b_3\) is where the curve starts bending

Instantaneous growth rate \(b_2\) is harder to eyeball, but you could manipulate a Gompertz model to see how it changes the curve
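One way to build that intuition in code: hold \(b_1\) and \(b_3\) fixed and evaluate the Gompertz curve at a single age under different growth rates (the specific values here are arbitrary illustrations).

```r
gompertz <- function(b1, b2, b3, age) b1 * exp(-exp(-b2 * (age - b3)))

# Body mass at age 2 for slow, medium, and fast growth rates
masses <- sapply(c(0.3, 0.7, 1.5), function(b2) gompertz(1000, b2, 0.3, age = 2))
round(masses)   # higher b2 -> closer to asymptotic mass at the same age
```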